Development of a conversational telephone speech recognizer for Levantine Arabic

Authors

  • Dimitra Vergyri
  • Katrin Kirchhoff
  • Venkata Ramana Rao Gadde
  • Andreas Stolcke
  • Jing Zheng
Abstract

Many languages, including Arabic, are characterized by a wide variety of dialects that often differ strongly from each other. When developing speech technology for dialect-rich languages, the portability and reusability of data, algorithms, and system components become extremely important. In this paper, we describe the development of a large-vocabulary speech recognition system for Levantine Arabic, which was a new dialectal recognition task for our existing system. We discuss the dialect-specific modeling choices (grapheme vs. phoneme based acoustic models, automatic vowelization techniques, and morphological language models) and investigate to what extent techniques previously tested on other languages are portable to the present task. We present state-of-the-art recognition results on the 2004 Levantine Arabic Rich Transcription evaluation.


Similar resources

Developing and Using a Pilot Dialectal Arabic Treebank

In this paper, we describe the methodological procedures and issues that emerged from the development of a pilot Levantine Arabic Treebank (LATB) at the Linguistic Data Consortium (LDC) and its use at the Johns Hopkins University (JHU) Center for Language and Speech Processing workshop on Parsing Arabic Dialects (PAD). This pilot, consisting of morphological and syntactic annotation of approxim...


Telephone-based conversational speech recognition in the JUPITER domain

This paper describes our experiences with developing a telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recogniz...


Telephone-based Conversational Speech Recognition in the Jupiter Domain

This paper describes our experiences with developing a telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recogniz...


Dialectal Arabic Orthography-based Transcription

The present paper describes the experience gained at LDC in the collection and transcription of conversational dialectal Arabic. The paper will cover the following: (a) Arabic language background; (b) objectives, principles, and methodological choices of dialectal Arabic transcription; (c) design features of LDC's "Arabic MultiDialectal Transcription Tool" (AMADAT) and metalanguage transcriptio...


Real-time telephone-based speech recognition in the Jupiter domain

This paper describes our experiences with developing a realtime telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the...



Publication date: 2005